Improved Tool Support for Machine-Code Decompilation in HOL4
Abstract
The HOL4 interactive theorem prover provides a sound logical environment for reasoning about machine-code programs. The rigour of HOL’s LCF-style kernel naturally guarantees very high levels of assurance, but it also presents challenges when it comes to implementing efficient proof tools. This paper presents improvements that have been made to our methodology for soundly decompiling machine-code programs to functions expressed in HOL logic. These advances have been facilitated by the development of a domain-specific language, called L3, for the specification of Instruction Set Architectures (ISAs). As a result of these improvements, decompilation is faster (on average by one to two orders of magnitude), the instruction set specifications are easier to write, and the proof tools are easier to maintain.
Traditional formal software verification has primarily focussed on developing and using formalisations of high-level programming languages, with formal reasoning occurring at the level of the programmer. However, in some high-assurance applications, such language formalisations may be too abstract or unrealistic, and the trustworthiness of compilers becomes a concern. These issues can be addressed by using a verified compiler, see [9] and [8]. An alternative approach, however, is to work directly with machine code, which could be generated by any compiler for a particular platform. This has the advantage that one does not have to formalise the semantics of high-level source languages, and formal reasoning relates more directly to the code that is actually being run. The success of this approach hinges on the ability to formalise a processor’s instruction set accurately and to overcome the challenges of working with low-level code, which is less structured and replete with platform-specific details. To this end, Magnus Myreen has developed an approach for soundly decompiling machine code using the HOL4 interactive theorem prover, see [12].
Commercial instruction set architectures are large and complex, with reference manuals running to thousands of pages.¹ We use the L3 domain-specific language to formally specify ISAs, see [4]. Details of our current ISA formalisations can be found in Sections 2 and 8. Each architecture has its own idiosyncrasies, which must be accommodated when writing proof automation for a theorem prover. In this paper, our main working instruction set is ARMv7-A.
¹ For example, 2736 pages for ARMv7-A, 5242 pages for ARMv8 (which contains a full description of the legacy AArch32 mode) and 3020 pages for x86.
Similar resources
Decompilation into logic - Improved
This paper presents improvements to a technique which aids verification of machine-code programs. This technique, called decompilation into logic, allows the verifier to only deal with tractable extracted models of the machine code rather than the concrete code itself. Our improvements make decompilation simpler, faster and more generally applicable. In particular, the new technique allows the ...
Trustworthy decompilation: extracting models of machine code inside an ITP
Modern processors support a large number of instructions and a multitude of features; as a result, detailed formal models of real instruction set architectures (ISAs) are long and hard to understand. Established approaches for proving functional properties on top of these models tie proofs to a specific model and require expert knowledge of the underlying model and substantial manual effort of...
Preprocessing of Binary Executable Files Towards Retargetable Decompilation
The goal of retargetable machine-code decompilation is to analyze and reversely translate platform-dependent executable files into a high level language (HLL) representation. This process can be used for many different purposes, such as legacy code reengineering, malware analysis, etc. Retargetable decompilation is a complex task that must deal with a lot of different platform-specific features...
Decompilation of Java bytecode to Prolog by partial evaluation
Reasoning about Java bytecode (JBC) is complicated due to its unstructured control-flow, the use of three-address code combined with the use of an operand stack, etc. Therefore, many static analyzers and model checkers for JBC first convert the code into a higher-level representation. In contrast to traditional decompilation, such representation is often not Java source, but rather some interme...
Compiling HOL4 to Native Code
We present a framework for extracting and compiling proof tools and theories from a higher order logic theorem prover, so that the theorem prover can be used as a platform for supporting reasoning in other applications. The framework is demonstrated on a small application that uses HOL4 to find proofs of arbitrary first order logic formulas.
Publication date: 2015